Abstract We propose predictive information, that is, the information between a
long past of duration $T$ and the entire infinitely long future of a time series,
as a universal order parameter to study phase transitions in physical systems.
It can be used, in particular, to study nonequilibrium transitions and other
exotic transitions, where a simpler order parameter cannot be identified using
traditional symmetry arguments. As an example, we calculate predictive
information for a stochastic nonequilibrium dynamics problem that forms an
absorbing state under a continuous change of a parameter. The information
at the transition point diverges as $\propto \log T$, and a smooth crossover to
$\propto T^0$ away from the transition is observed.



1 Introduction 

The theory of critical phenomena and the emergent notion of universality was
one of the singular developments of physics in the twentieth century. With a
known order parameter and symmetries of the problem, calculation of long-range,
measurable behaviors of equilibrium physical quantities becomes a rather
straightforward task. This success has turned out to be hard to replicate for
nonequilibrium systems and systems where symmetry properties are similar in the
phases on both sides of the transition [1]. Here it is often unclear which
quantity can serve as a good order parameter, and the developed theoretical
machinery does not apply. Where progress has been made, order parameters have
been very specific, making it difficult to identify universal properties. For
example, in reaction-diffusion problems with absorption, one commonly uses
linear superpositions of particle concentrations as order parameters [2,3],
while particle current is a better choice for jamming problems [4]. Further,
the order parameters often have nontrivial relations to easily observable
quantities. For example, phase transitions in some systems with dynamic
heterogeneities often must be described with four-point correlation functions
of particle densities [5], or a multitude of correlation functions [6,7].
Similarly, dynamical phase transitions require one to study the space of
trajectories instead of the state space [8].

Whatever the choice, the order parameter is a statistic averaged over a
distribution of microscopic states. A continuous or discontinuous change in its 
value at a transition indicates a similar change in the underlying probability 
distribution. Therefore, it is natural to shift attention to the distribution 
itself, and, specific to nonequilibrium systems, to how it converges to the 
steady state. 

Intuitively, different phases (often with different symmetries) manifest 
themselves by changes in our ability to use local experimental measurements 
for long-range predictions. For example, nonzero magnetization in an Ising 
magnet allows us to predict with some certainty orientation of far away spins 
based on the value of the spin at the origin. Similarly, different crystalline 
phases of solids have different density autocorrelation functions, and hence 
existence of an atom at the origin translates into different predictions about 
the presence of an atom a certain distance away. Then instead of a specific 
statistic characterizing the predictability, namely the order parameter, it
might be useful to study one's ability to use local measurements to predict 
states of the rest of the system directly. 

This prediction ability is naturally quantified using the language of Shannon's
information theory [9]. In previous work, we have termed it the predictive
information [10,11]. Briefly, in information theory, the total uncertainty
in a system specified by a state $x \in X$, $\dim x = N$, is measured by the
(differential) entropy,

$$S[X] = -\int d^N x\, P(x) \log_2 P(x). \tag{1}$$

Then observing a state of another variable $y \in Y$, $\dim y = M$, may reduce
the uncertainty about $x$, and hence provide the information about it

$$I[X;Y] = S[X] - \langle S[X|Y] \rangle_y = \int d^N x\, d^M y\, P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}. \tag{2}$$

Importantly, $I[X;Y]$ depends on the entire probability distribution $P(x,y)$,
and not just on its specific statistics, and it is zero iff $X$ and $Y$ are
statistically independent.
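As an illustration of Eqs. (1) and (2), the following minimal sketch evaluates the entropy and the mutual information for discrete variables. It is an assumption of this example that sums replace the integrals of the differential definitions; the function names and the two toy joint tables are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(joint):
    """Mutual information I[X;Y] in bits; joint[i][j] = P(x_i, y_j)."""
    px = [sum(row) for row in joint]            # marginal P(x)
    py = [sum(col) for col in zip(*joint)]      # marginal P(y)
    info = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                info += pxy * math.log2(pxy / (px[i] * py[j]))
    return info

# Toy examples (hypothetical): independent variables and a perfect copy.
independent = [[0.25, 0.25], [0.25, 0.25]]
copied = [[0.5, 0.0], [0.0, 0.5]]
```

For the independent table the information vanishes, while for the perfectly copied binary variable it equals the full entropy of one bit, in line with the statement that the information is zero iff the variables are statistically independent.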
One can consider $X$ and $Y$ to be states of a physical process, such that
$X$ are the measured quantities, and $Y$ are the quantities that one wants to
predict [10]. For example, $X$ can be the state of spins on one segment of an
Ising chain, and $Y$ the state of spins far away. Similarly, for time series and
for nonequilibrium processes, $X$ can be the past of the process of duration
$N$, and $Y$ part of its future of duration $M$. Then the information becomes
the predictive information:

$$I_{\rm pred}(N,M) = I[X;Y]. \tag{3}$$

Since the quantification of the intrinsic state of the system should not depend
on which specific set of variables $Y$ one wants to predict, it makes sense to
define predictive information as

$$I_{\rm pred}[X] = I_{\rm pred}(N) = \lim_{M\to\infty} I_{\rm pred}(N,M). \tag{4}$$

That is, one quantifies how much information the local observations X pro- 
vide about an entire, infinitely large physical system. 

Predictive information is subextensive, $\lim_{N\to\infty} I_{\rm pred}(N)/N = 0$ [10].
It tends to a handful of universal behaviors for large systems, $N \to \infty$,
intuitively correlating with the complexity of the underlying physical process.
In particular, $\lim_{N\to\infty} I_{\rm pred}(N) = {\rm const}$ indicates an easily
predictable deterministic, or a short correlation length probabilistic dynamics
("simple" long range prediction can be perfect, or it is impossible, respectively).
Further, $\lim_{N\to\infty} I_{\rm pred}(N) \propto \log N$ is indicative of a second
order equilibrium phase transition (power-law decaying correlations allow for
complex, multiscale, partially predictable patterns over very long distances).
Finally, $\lim_{N\to\infty} I_{\rm pred}(N) \propto N^{a}$, $a < 1$, may correspond to
more exotic phase transitions with infinite-dimensional order parameters, but
this case is not well understood.

The dependence of $I_{\rm pred}$ on the full underlying probability distribution
and the relation to phase transitions make it natural to explore $I_{\rm pred}$ as
a "universal order parameter", also usable in the nonequilibrium context.
However, we are not aware of calculations of predictive information for
nonstationary processes, where $P(x)$ is explicitly or implicitly time dependent.
Further, even for equilibrium systems, the transition between
$I_{\rm pred} = {\rm const}$ and $I_{\rm pred} \propto \log N$ in the vicinity of a
phase transition has not been studied.

In this paper, we study predictive information in the context of a simple
nonequilibrium, continuous-time Markov process, which ages and develops an
absorbing state at a certain critical value of a parameter. This process can
be viewed as a toy model, which is likely to possess some features of more
complex systems. We calculate the expression for predictive information at
the critical point and, for the first time for any system, near the critical
point. The calculation reveals the need to modify the definition, Eq. (4), to
remove an ultraviolet divergence emerging due to the continuous-time nature
of the process. Similar modifications will likely allow extension of predictive
information methodology to multidimensional systems. We demonstrate
explicitly the logarithmic divergence of $I_{\rm pred}$ at the transition, and we
show that the divergent term in the information is insensitive to temporally
local, invertible transformations of the state space. This makes predictive
information, and specifically its divergent term, a strong candidate to
characterize nonequilibrium phase transitions.

2 The model 

We consider a Markovian system governed by the following Langevin equation:

$$\partial_t x(t) = -x(x^2 + r) + \sqrt{2}\,\sigma |x|^{\alpha/2} \eta, \tag{5}$$

$$x(t = 0) = x_0, \ \text{sampled from}\ P(x_0) = P_0, \tag{6}$$

where $\langle \eta(t)\eta(t') \rangle = \delta(t - t')$. We will treat this equation
in the Ito sense. Without the noise term, $x$ relaxes from the initial value $x_0$
to either $0$ or $\pm\sqrt{-r}$, depending on whether $r > 0$. The transition
happens at $r = 0$. For large noise near $x = 0$ (that is, small $\alpha$), $x$
gets kicked out from the $x \approx 0$ region, and the system equilibrates. For
small noise (large $\alpha$), a near-deterministic relaxation to the absorbing
state at $x = 0$ persists. This is probably the simplest example of
nonequilibrium, stochastic relaxation dynamics, and it is a natural starting
point for the analysis.

We note that we can view this equation as describing dynamics of
magnetization, $x$, along a line normal to a boundary of an Ising ferromagnet in
some number of spatial dimensions. The coordinate is $t = 0$ at the boundary,
and increases into the bulk. The deterministic cubic dynamics in Eq. (5) is
the usual coarse-grained model of such a ferromagnet. In such a model, the
variance of the noise increases with $x$, and $\alpha$ would depend on the overall
dimensionality of the problem.

To calculate predictive information, Eq. (4), we discretize the time $t$,
$t_n = n\Delta t$, and $x_n = x(t_n)$. We choose $\Delta t \to 0$, and yet
$N\Delta t = T_p \to \infty$, and $M\Delta t = T_f \to \infty$, where $p$ and $f$
stand for past and future, respectively. Then Eq. (5) is equivalent to the
following Markovian dynamics:

$$P(x_{n+1}|x_0, x_1, \ldots, x_n) = P(x_{n+1}|x_n). \tag{7}$$
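The discretized Markovian dynamics can be sketched numerically with an Euler-Maruyama step, which evaluates the noise amplitude at the start of each step and is therefore consistent with the Ito reading of the Langevin equation. This is only an illustrative sketch: it assumes the noise term has the form $\sqrt{2}\,\sigma|x|^{\alpha/2}\eta$, and the function name, parameter names, and default seed are hypothetical choices, not part of the paper.

```python
import math
import random

def simulate(x0, r, sigma, alpha, dt, n_steps, seed=0):
    """Euler-Maruyama (Ito) integration of
    dx = -x (x^2 + r) dt + sqrt(2) sigma |x|^(alpha/2) dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = -x * (x * x + r)
        # Noise amplitude is evaluated at the current x (Ito convention).
        amplitude = math.sqrt(2.0 * dt) * sigma * abs(x) ** (alpha / 2.0)
        x = x + drift * dt + amplitude * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# Noise-free relaxation on both sides of the transition at r = 0.
decay = simulate(x0=1.0, r=1.0, sigma=0.0, alpha=3.0, dt=0.01, n_steps=2000)
ordered = simulate(x0=0.5, r=-1.0, sigma=0.0, alpha=3.0, dt=0.01, n_steps=2000)
# A trajectory started exactly at x = 0 never leaves it, even with noise.
absorbed = simulate(x0=0.0, r=1.0, sigma=1.0, alpha=3.0, dt=0.01, n_steps=100)
```

With the noise switched off, the trajectory relaxes to $0$ for $r > 0$ and to $\sqrt{-r}$ for $r < 0$, and a trajectory started at $x = 0$ stays there because both the drift and the multiplicative noise vanish.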




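To simplify the notation, we define

$$P_{n|n-1} \equiv P(x_n|x_{n-1}), \tag{8}$$

and write $P_N \equiv P(x_N)$ for the marginal distribution of $x_N$. Then:

$$I_{\rm pred}(N,M) = \left\langle \log_2 \frac{P_{N|N-1}}{P_N} \right\rangle = I[x_N; x_{N-1}]. \tag{9}$$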
Not surprisingly for a Markovian process, predictive information is the
mutual information between two successive measurements and does not depend
on the length of the future sequence, $M$, so that the limit, Eq. (4), is
trivial. However, the information can depend on $N$ since the system is not
stationary, and not time-translation invariant. Specifically, for small noise,
each subsequent $x$ is more narrowly distributed. This allows the information
to increase unboundedly with $N$, unlike in typical finite-dimensional Markov
processes with constant transition probabilities, where $I_{\rm pred}$ is always
finite [10]. These considerations also point out that one must take the sequence
of $N$ observations starting from exactly the same time when calculating the
averages.
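The reduction of the block information to the information between two adjacent measurements can be checked by brute force on a small discrete chain. The sketch below is not the continuous model of this paper: the two-state transition matrix `T` and the non-stationary initial distribution `p0` are hypothetical values chosen only to illustrate the Markov property.

```python
import itertools
import math

T = [[0.9, 0.1], [0.3, 0.7]]   # T[a][b] = P(x_{n+1} = b | x_n = a), hypothetical
p0 = [0.2, 0.8]                # non-stationary initial distribution, hypothetical

def path_prob(path):
    """Probability of a whole trajectory x_0, x_1, ... of the chain."""
    p = p0[path[0]]
    for a, b in zip(path, path[1:]):
        p *= T[a][b]
    return p

def mutual_information(joint):
    """I[X;Y] in bits for a dict mapping (x, y) -> probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def block_information(n_past, n_future):
    """Information between the first n_past points and the next n_future points."""
    joint = {}
    for path in itertools.product(range(2), repeat=n_past + n_future):
        key = (path[:n_past], path[n_past:])
        joint[key] = joint.get(key, 0.0) + path_prob(path)
    return mutual_information(joint)

def adjacent_information(n_past):
    """Information between x_{N-1} and x_N, with N = n_past."""
    joint = {}
    for path in itertools.product(range(2), repeat=n_past + 1):
        key = (path[n_past - 1], path[n_past])
        joint[key] = joint.get(key, 0.0) + path_prob(path)
    return mutual_information(joint)
```

For this chain, `block_information(N, M)` does not change with `M` and coincides with `adjacent_information(N)`, while it can still depend on `N` because the initial distribution is not stationary.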

Since $x(t)$ is continuous, $x_n \to x_{n-1}$ as $\Delta t \to 0$. The state of the
process at the next time step becomes exactly known, and predictive information
diverges. However, this is a superficial ultraviolet divergence, while we are
interested in studying the infrared behavior. Interestingly, this interfacial
effect has been the primary reason behind the inability to apply predictive
information ideas to systems in more than one dimension, where the size of the
interface diverges with the system size. This makes it difficult to disambiguate
divergences in predictive information coming from long-range prediction from
those produced by short range interfacial effects.

We thus need to introduce a cutoff scale into the system, at which
predictive information is computed, similarly to how one does this in the
renormalization group theory. For this, we redefine predictive information as
mutual information between the past of duration $T_p = N\Delta t$ and the future
of duration $T_f = M\Delta t$, separated by a "scale" gap of duration
$T_s = L\Delta t$, which remains finite as $\Delta t \to 0$. That is



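$$I_{\rm pred}(N,M|L) = \left\langle \log_2 \frac{P_0 \prod_{n=1}^{N-1} P_{n|n-1}\; P_{N+L|N-1} \prod_{m=N+L+1}^{N+L+M-1} P_{m|m-1}}{P_0 \prod_{n=1}^{N-1} P_{n|n-1}\; P_{N+L} \prod_{m=N+L+1}^{N+L+M-1} P_{m|m-1}} \right\rangle \tag{10}$$

$$= \left\langle \log_2 \frac{P_{N+L|N-1}}{P_{N+L}} \right\rangle = I[x_{N+L}; x_{N-1}]. \tag{11}$$

Here

$$P_{N+L|N-1} = \int \prod_{n=N}^{N+L-1} dx_n \prod_{m=N}^{N+L} P_{m|m-1}. \tag{12}$$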



3 Invariance of predictive information 

From Eq. (11), it is clear that predictive information is invariant under
reparameterization of $x$. This is a desired property for any potential universal
order parameter. Further, any experimental device measuring $x(t)$ will act
as a temporal filter, so that the measured values will be convolutions of true
$x$'s at nearby time points. Thus it is also desirable for the nonequilibrium
order parameter to be invariant to temporally local invertible transformations
of data [10]. Does the predictive information obey this property?
The filter, represented by $\mathcal{F}$, maps the sequences of true states of the
system $\{x\}$ into measured data $\{\chi\}$. We require that the filter does not
inject additional information into the dynamics. This means that the extraneous
parameters of the mapping $\mathcal{F}$ must be known. In a real-life experiment,
this means that we would like to be able to separate the behavior of the observed
system from any artifacts associated with the experimental setup.

In general terms, such a filter can be represented by a convolution kernel
$C(t - t')$. Since a convolution mixes the past and the future, the measured
data $\{\chi\}$ is no longer Markovian. We require that the so-introduced
statistical dependences are short lived, i.e., the kernel $C(t - t')$ is of compact
support or decreases with time exponentially or faster. This is our definition of
temporal locality.

Convolutions are reductions in rank and therefore (potentially) invertible
only for infinitely long data sequences. Therefore, we can define invertibility
only in the $t \to \infty$ limit. To this end, let $\mathfrak{B} = \bigotimes_n \mathbb{R}$
be the space of all temporally discretized, finite length trajectories, that is
the space of all $n$-tuples of $x$, $n < \infty$. Let
$\mathcal{F}: \mathfrak{B} \to \mathfrak{B}$ be a function such that
$\mathcal{F}(\mathbb{R}^{N+\nu}) \subset \mathbb{R}^{N}$. That is, a sequence of $N$
data points is defined from $N + \nu$ points through some filtering procedure. We
consider this mapping to be invertible if the Radon-Nikodym derivative over the
set $\mathcal{F}^{-1}(\chi \in \mathbb{R}^{N})$ converges to a delta function for
$N \to \infty$. More specifically, the probability of observing a trajectory
$\{\chi_i\}_{i=1}^{N}$ is given by



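$$\int d^{N+\nu}x\, P(\{x_j\}_{j=-\nu}^{N}) \prod_{j=1}^{N} \delta\Big(\chi_j - \sum_k C(j-k)\, x_k\Big)$$

$$= \int d^{N+\nu}x\, \frac{d^{N}\lambda}{(2\pi)^{N}} \times \exp\Big[ i \sum_j \lambda_j \Big(\chi_j - \sum_k C(j-k)\, x_k\Big) + \ln P(\{x_j\}_{j=-\nu}^{N}) \Big]. \tag{13}$$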



Thus invertibility requires that the Hessian matrix of the exponent in this
equation diverges, defining a dominant stationary solution of the corresponding
"action". With this requirement, $\{\chi_i\}$ are simply reparameterizations of
$\{x_j\}$, and predictive information is invariant under the change. While this
requirement is very general, we suspect that, in practice, it will be equivalent
to the asymptotic properties of trajectory-averaged quantities, for which
there are already well established results [12]. We leave exploration of these
conditions to future work.



4 Solving the model 

To calculate predictive information in the model, we first calculate the Green's
functions (the marginal and the conditional distributions) of Eq. (5). For this,
we write the Fokker-Planck equation corresponding to the Langevin dynamics

$$\partial_t p(x,t) = \partial_x\left[x(x^2 + r)\,p(x,t) + \sigma^2 \partial_x\left(|x|^{\alpha} p(x,t)\right)\right]. \tag{14}$$

This equation immediately confirms our earlier statement that $p(x,t) = \delta(x)$
is a stationary state, stability of which depends on the strength of the noise,
which in turn is controlled by $\alpha$. As a result, the equation can develop a
singularity near $x = 0$. Fortunately, the probability current at $x = 0$ is
zero. Thus for $x_0 > 0$, we can consider $x(t) > 0$ for any $t$. Further, we
seek the solution for $r > 0$, hoping further to analytically continue to the
entire real axis of $r$. With these caveats, we make the following simplifying
transformations:



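[Eqs. (15)-(18), which define the rescaled quantities $\tilde r$, $\tilde t$,
$\tilde y$, and the function $f$ in terms of $p(x(y),t)$, are not recoverable
from the source.]

$$\beta = 2/(\alpha - 2), \tag{19}$$

$$n = 2(\alpha - 1)/(\alpha - 2). \tag{20}$$

Then Eq. (14) becomes

$$\partial_{\tilde t} f = -\frac{1}{\tilde y^{\,n-1}}\,\partial_{\tilde y}\left[\left(\tilde y^{\,n} + \tilde r^{\,(n-3)}\, \tilde y^{\,4-n}\right) f\right] + \frac{1}{\tilde y^{\,n-1}}\,\partial_{\tilde y}\left(\tilde y^{\,n-1}\partial_{\tilde y} f\right). \tag{21}$$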



The initial condition should obey $p(y = 0, t) = p(y \to \infty, t) = 0$. The former
condition is a result of the inverse relationship between $x$ and $y$, while the
latter is due to $x = 0$ being the absorbing state.

It is important to discuss the allowed values of $\alpha$ at this point. From
Eq. (20), $n$ becomes divergent at $\alpha = 2$. This corresponds to a large noise,
which hides the phase transition. On the other hand, for large $\alpha$, the noise
is negligible, and the system is in an effectively deterministic regime. This
happens at $n < 3$, where the second term in Eq. (21) is suppressed as
$\tilde r \to 0$. Thus we are interested in $3 < n < \infty$, which corresponds to
$2 < \alpha < 4$. In this regime, the $\tilde r$ term in Eq. (21) is negligibly
small, and can be dropped.
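The correspondence between the noise exponent and the effective dimension can be made concrete with a few lines of code. This is a minimal sketch assuming the definitions $\beta = 2/(\alpha-2)$ and $n = 2(\alpha-1)/(\alpha-2)$; the function names are hypothetical.

```python
def beta(alpha):
    """Exponent beta = 2 / (alpha - 2), defined for alpha != 2."""
    return 2.0 / (alpha - 2.0)

def effective_dimension(alpha):
    """Effective dimension n = 2 (alpha - 1) / (alpha - 2)."""
    return 2.0 * (alpha - 1.0) / (alpha - 2.0)

def in_regime_of_interest(alpha):
    """True when 2 < alpha < 4, i.e. when 3 < n < infinity."""
    return 2.0 < alpha < 4.0
```

Two consequences of these definitions are easy to read off: $n = \beta + 2$ for every admissible $\alpha$, and the boundary $\alpha = 4$ maps onto $n = 3$, while $n$ grows without bound as $\alpha \to 2$ from above.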

With this, we notice that Eq. (21) is the radial part of the diffusion
equation in $n$ dimensions. Thus our strategy is to solve it first for $n$ integer,
hoping to analytically continue to all $n$ later on. Assuming an integer $n$, we
rewrite Eq. (21):

$$\partial_{\tilde t} f = -n f - \tilde y\,\partial_{\tilde y} f + \frac{1}{\tilde y^{\,n-1}}\,\partial_{\tilde y}\left(\tilde y^{\,n-1}\partial_{\tilde y} f\right). \tag{22}$$

Therefore, $f(\tilde y)$ is the radially symmetric part of the solution of the
following equation

$$\partial_{\tilde t} f = -n f - \tilde{\mathbf{y}}\cdot\nabla f + \nabla^2 f. \tag{23}$$

We solve this equation in Appendix A, resulting in:



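$$G(t,y,z) = C(n)\, z^{n-1} \left[\frac{\tilde r}{2\pi(e^{2\tilde r t} - 1)}\right]^{n/2} \int_{-1}^{1} d\lambda\, \exp\left(-\frac{\tilde r}{2(e^{2\tilde r t} - 1)}\left(y^2 - 2yze^{\tilde r t}\lambda + z^2 e^{2\tilde r t}\right)\right) K(\lambda), \tag{24}$$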
where $K(\lambda)$ is a kernel, which, for integer $n$, is the Jacobian of the
$n$-dimensional change of variables from Cartesian to spherical coordinates. We
still need to determine it for non-integer dimensions. For this, we substitute
the expression of Eq. (24) in Eq. (22) (for general $n$) and find that it satisfies
the equation iff $K(\lambda)$ obeys

$$\partial_\lambda^2\left[(1 - \lambda^2)K(\lambda)\right] + (n-1)\,\partial_\lambda\left(\lambda K(\lambda)\right) = 0. \tag{25}$$

To guarantee regularity at $\lambda = \pm 1$ (and in analogy with the integer
dimensional cases), we additionally impose the condition that $K(\pm 1) = 0$,
leading to the solution

$$K(\lambda) = (1 - \lambda^2)^{(n-3)/2}. \tag{26}$$
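As a quick consistency check, and assuming the kernel has the form $K(\lambda) = (1-\lambda^2)^{(n-3)/2}$, one can verify by direct differentiation that it solves the kernel equation for any real $n$:

```latex
\begin{align*}
(1-\lambda^2)\,K(\lambda) &= (1-\lambda^2)^{(n-1)/2}, \\
\partial_\lambda\left[(1-\lambda^2)\,K(\lambda)\right]
  &= -(n-1)\,\lambda\,(1-\lambda^2)^{(n-3)/2}
   = -(n-1)\,\lambda K(\lambda), \\
\partial_\lambda^2\left[(1-\lambda^2)\,K(\lambda)\right]
  + (n-1)\,\partial_\lambda\left(\lambda K(\lambda)\right) &= 0 .
\end{align*}
```

The second line is a first integral of the kernel equation with a vanishing integration constant, and for $n > 3$ the kernel indeed vanishes at $\lambda = \pm 1$.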

The normalization constant $C(n)$ can be determined from the requirement
that the integral over $y$ for a fixed $z$ is unity when $t \to 0$. In the case of an
integer $n$, $C(n)$ is the area of the unit sphere in $n - 1$ dimensions. To verify
this for any value $n$, we need to perform the integration explicitly. To this
end, it is convenient to introduce $A = [(e^{2\tilde r t} - 1)/\tilde r]^{1/2}$, and
$z' = ze^{\tilde r t}$. Then integrating Eq. (24), we get

G(t,y,z)dy = C(n)z n - 1 [ -=-) dyexp 1 



J (\/2^ : ) 1 -'M 1 - n exp 



2ttAJ Jo V 2A2 

yz'(l - A) 



A 2 



K(X) dX. (27) 



We concentrate on the inner integral first. We perform the substitution
$\xi = yz'(1 - \lambda)/A^2$, which leads to



r 2yz'/A 2 _ 

/ (yz')-^(v / 27) 1 -"e"« 
■Jo 



el>-4 

yz' 



By dominated convergence, the limit is valid for any $y$ and all $n > 1$. (The
cases $3 > n > 1$ follow from the fact that $\xi(2 - A^2\xi/yz') > \xi$ for
$0 < \xi < yz'/A^2$, while the portion of the integral in Eq. (28) between
$yz'/A^2 < \xi < 2yz'/A^2$ converges to $0$ as $A \to 0$.) Furthermore, since
$yz'/A^2$ controls the convergence in a monotonic fashion, the limit is uniform
on any semi-infinite interval not containing $0$. Since the convergence is
dominated by a multiple of $(yz)^{-(n-1)/2}$, particularly for the values of $y$
close to zero, we recognize the outer integral in Eq. (27) as a delta function.
Therefore, in order to bring the value of Eq. (27) to unity, we need that

$$C(n) = \frac{2\pi^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)}, \tag{29}$$

which is the area of the $n - 1$ dimensional unit sphere when $n$ is integer.

By reverting back to the original coordinate $x$, we can rewrite Eq. (24)
and obtain the solution in these coordinates. However, for the purposes of
the next section, it is more convenient to stay in the $y$ space instead. Notice
that if we make the substitution $\tilde p = y^{-\alpha\beta/2} p$ in Eq. (14), we obtain

$$\partial_t \tilde p = -\partial_y\left(\left(\tilde r y + \frac{n-1}{y} + y^{5-2n}\right)\tilde p\right) + \partial_y^2 \tilde p. \tag{30}$$

The advantage of $\tilde p$ over $f$ calculated earlier is that $\tilde p$ is a
probability distribution. We can immediately write its Green's function from
Eq. (24) since $\tilde p(t,y) = y^{n-1} f(t,y)$:

$$\tilde G(t,y,z) = C(n)\, y^{n-1} \left[\frac{\tilde r}{2\pi(e^{2\tilde r t} - 1)}\right]^{n/2} \times$$

$$\times \int_{-1}^{1} d\lambda\, \exp\left(-\frac{\tilde r}{2(e^{2\tilde r t} - 1)}\left(y^2 - 2yze^{\tilde r t}\lambda + z^2 e^{2\tilde r t}\right)\right) K(\lambda). \tag{31}$$

This is the main result of this section, which we will use in order to calculate
predictive information for our model. One can verify by explicit substitution
that the expression in Eq. (31) satisfies the Fokker-Planck equation, Eq. (21),
and it reduces to a delta function as $t \to 0$. Thus it represents the conditional
distribution of $y$ given $z$.
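Since the Green's function is meant to be a conditional probability density in $y$, its normalization can be checked numerically, including for non-integer $n$. The sketch below is a rough illustration under stated assumptions: it takes the kernel $K(\lambda) = (1-\lambda^2)^{(n-3)/2}$, the constant $C(n) = 2\pi^{(n-1)/2}/\Gamma((n-1)/2)$, and the Gaussian form of the integrand; `r` stands for $\tilde r$ and must be nonzero here. The midpoint-rule grids (`n_lam`, `y_max`, `n_y`) are arbitrary, hypothetical choices.

```python
import math

def kernel(lam, n):
    """K(lambda) = (1 - lambda^2)^((n - 3) / 2)."""
    return (1.0 - lam * lam) ** ((n - 3.0) / 2.0)

def sphere_area(n):
    """C(n) = 2 pi^((n-1)/2) / Gamma((n-1)/2)."""
    return 2.0 * math.pi ** ((n - 1.0) / 2.0) / math.gamma((n - 1.0) / 2.0)

def green(t, y, z, r, n, n_lam=600):
    """Conditional density of y at time t given z at time 0 (r != 0)."""
    a2 = math.expm1(2.0 * r * t) / r          # A(t)^2
    zp = z * math.exp(r * t)                  # z' = z e^{r t}
    dlam = 2.0 / n_lam
    total = 0.0
    for i in range(n_lam):                    # midpoint rule on (-1, 1)
        lam = -1.0 + (i + 0.5) * dlam
        total += kernel(lam, n) * math.exp(
            -(y * y - 2.0 * y * zp * lam + zp * zp) / (2.0 * a2))
    prefactor = sphere_area(n) * y ** (n - 1.0) * (2.0 * math.pi * a2) ** (-n / 2.0)
    return prefactor * total * dlam

def total_probability(t, z, r, n, y_max=12.0, n_y=600):
    """Midpoint-rule integral of the density over 0 < y < y_max."""
    dy = y_max / n_y
    return sum(green(t, (j + 0.5) * dy, z, r, n) for j in range(n_y)) * dy
```

Within the accuracy of the quadrature, `total_probability` returns a value close to one both for integer $n$ (where the expression is just the radial part of an $n$-dimensional Gaussian) and for fractional $n$ such as $3.5$.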



5 Predictive information for the model 

Predictive information is reparameterization invariant. Thus we can calculate
it for $y$ instead of $x$ and use the expression, Eq. (31), when applying
Eq. (11) to our model. Without loss of generality, we assume that the initial
condition is a delta function. Then the continuous form of Eq. (11) is

$$I_{\rm pred}(t) = \left\langle \log_2 \frac{\tilde G(\tau, y, z)}{\tilde G(t + \tau, y, w)} \right\rangle, \tag{32}$$

where $w$, $z$, and $y$ are the values of the observable at times $0$,
$t = (N-1)\Delta t$, and $T = t + \tau = (N+L)\Delta t$ respectively, i.e.,
$w = x_0^{-1/\beta}$, $z = x_{N-1}^{-1/\beta}$, and $y = x_{N+L}^{-1/\beta}$.
Equation (32) involves an integral with complex time and $\tilde r$-dependences.
In the following, we would like to find the leading orders of these dependences.
Defining $A(t) = [(e^{2\tilde r t} - 1)/\tilde r]^{1/2}$ (cf. Eq. (27)), it is also
convenient to introduce
$\Xi(t; \lambda, y, w) = \exp[-(y^2 - 2ywe^{\tilde r t}\lambda + w^2 e^{2\tilde r t})/(2A(t)^2)]$,
so that Eq. (31) takes on the form

$$\tilde G(t,y,z) = \frac{C(n)\, y^{n-1}}{(2\pi)^{n/2} A^n(t)} \int_{-1}^{1} d\lambda\, K(\lambda)\, \Xi(t; \lambda, y, z). \tag{33}$$

Then Eq. (32) becomes

$$I_{\rm pred}(t) = n \log_2 \frac{A(T)}{A(\tau)} + \left\langle \log_2 \int_{-1}^{1} d\lambda\, K(\lambda)\, \Xi(\tau; \lambda, y, z) \right\rangle - \left\langle \log_2 \int_{-1}^{1} d\lambda\, K(\lambda)\, \Xi(T; \lambda, y, w) \right\rangle. \tag{34}$$
In Appendix C, we show that the last two terms in Eq. (34) are asymptotically
constant when $T \to \infty$ if $\tau$ is large and $\tilde r$ is small. Therefore,
to the leading order, predictive information is

$$I_{\rm pred}(t) \approx n \log_2 \frac{A(T)}{A(\tau)} = \frac{n}{2} \log_2 \frac{\exp[2\tilde r(t + \tau)] - 1}{\exp(2\tilde r \tau) - 1}. \tag{35}$$

At the critical point, when the absorbing state is just starting to emerge,
$\tilde r \to 0$, this expression reduces to

$$I_{\rm pred}(t) \approx \frac{n}{2} \log_2 \frac{t + \tau}{\tau}. \tag{36}$$

This logarithmic growth with the system size $t$ has been anticipated for a
critical point in Ref. [10], but has not been calculated before for any
nonequilibrium stochastic dynamical system. A plot of Eq. (35) is given for
different parameter values in Fig. 1.

Notice that the prefactor $n = 2(\alpha - 1)/(\alpha - 2)$ increases with the effect
of the noise, which corresponds to more partially predictable variability in the
dynamics, and hence to an intuitively higher complexity. Further, as
$\alpha \to 2$, or $n \to \infty$, the leading term in predictive information becomes
extensive, and hence it would cancel out in the difference of entropies in
Eq. (2), leading to $I_{\rm pred}(t) = {\rm const}$. Equation (35) also allows
calculation of the asymptotics away from the phase transition. For large negative
$\tilde r$, $I_{\rm pred}(t) = {\rm const}$. For large positive $\tilde r$,
$I_{\rm pred}(t) \propto t$, since perfect prediction is possible in the absorbing
state. Hence it cancels out as well, leading to the constant limit, and
indicating the absence of the phase transition. These results illustrate that
divergence of predictive information correctly captures the existence of the
phase transition (emergence of the absorbing state) at $r \to 0$.
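The three asymptotic regimes discussed above can be read off numerically from the leading-order expression. The sketch below assumes the leading term has the form $(n/2)\log_2\{[e^{2\tilde r(t+\tau)}-1]/[e^{2\tilde r\tau}-1]\}$, with the critical case $\tilde r = 0$ handled through its limit; the function name and the sample parameter values are hypothetical.

```python
import math

def predictive_information(t, tau, r, n):
    """Leading-order predictive information in bits for past duration t,
    gap tau, rescaled control parameter r, and effective dimension n."""
    if r == 0.0:
        # Critical point: the limit r -> 0 of the general expression.
        return 0.5 * n * math.log2((t + tau) / tau)
    ratio = math.expm1(2.0 * r * (t + tau)) / math.expm1(2.0 * r * tau)
    return 0.5 * n * math.log2(ratio)

critical = predictive_information(7.0, 1.0, 0.0, 4.0)       # logarithmic growth
near_critical = predictive_information(7.0, 1.0, 1e-9, 4.0)  # smooth crossover
```

At $\tilde r = 0$ the information grows as $(n/2)\log_2[(t+\tau)/\tau]$; for negative $\tilde r$ it saturates to a constant once $t$ is large, and for positive $\tilde r$ it grows linearly with slope $n\tilde r \log_2 e$. Very large positive arguments of the exponential would overflow in this simple form.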



6 Discussion 



Predictive information was introduced in Ref. [10] as information between
the past and the future of a time series, or between left and right parts of a
physical system. It was argued, in particular, that the behavior of predictive
information as the system size grows can signal existence of a phase transition.
As an example, Ref. [10] calculated the information numerically for an
equilibrium long-range one-dimensional Ising magnet. In the current work,
we argue that predictive information can be used as a universal order
parameter in more complicated scenarios, such as in nonequilibrium contexts,
where traditional symmetry arguments fail to identify low-order correlation
functions that can serve this role. For the first time, we calculate predictive
information for a nonequilibrium Markov process, which exhibits a phase
transition at certain values of parameters. Divergence of predictive information
correctly captures this phase transition. In addition to results at and
far away from the critical point, our calculations reveal how predictive
information behaves near a phase transition, exhibiting a smooth crossover from
an asymptotically constant to an asymptotically divergent regime. To our
knowledge, this has not been calculated before, either for equilibrium or for
nonequilibrium systems.

One important technical difference between this work and the previous 
ones is the introduction of an additional "renormalization" scale, L or t, in 
the definition of predictive information, so that the information is calculated 
between the past and the future that are separated by a finite distance. This 
removed the ultraviolet divergences associated with information at the inter- 
face between the past and the future of a trajectory. While this modification 
was precipitated by the continuous time/space nature of the stochastic process,
we believe that it will additionally solve difficulties with application
of predictive information ideas to systems with more than one dimension. 
Indeed, there the main problem is that the interface between two parts of a 
system diverges with the system size, and hence the interfacial contribution 
to predictive information diverges even away from a critical point. This will 
not happen if direct interfaces are eliminated. 

In summary, in this paper, we provide the first example of a direct ana- 
lytical calculation of predictive information for a nonequilibrium stochastic 
process. This example argues further for using predictive information as a 
universal order parameter for studying phase transitions. 
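A Calculating the Green's function

The Green's function of Eq. (23) is easier to find in the Cartesian coordinates,
and the radial component can be extracted afterwards. Thus we look for the
Green's function of the form

$$G(\tilde t; \tilde{\mathbf{y}}, \tilde{\mathbf{z}}) = \prod_{i}^{n} G_1(\tilde t; \tilde y_i, \tilde z_i), \tag{37}$$

where $G_1(\tilde t; \tilde y_i, \tilde z_i)$ is the one dimensional Green's function,
satisfying

$$\partial_{\tilde t} G_1 = -G_1 - \tilde y\,\partial_{\tilde y} G_1 + \partial_{\tilde y}^2 G_1 + \delta(\tilde t, \tilde y - \tilde z). \tag{38}$$

To solve Eq. (38), it is convenient to consider $\tilde G_1 = e^{\tilde t} G_1$,
where $\tilde G_1$ satisfies

$$\partial_{\tilde t} \tilde G_1(\tilde t; \tilde y, \tilde z) - \partial_{\tilde y}^2 \tilde G_1(\tilde t; \tilde y, \tilde z) + \tilde y\,\partial_{\tilde y} \tilde G_1(\tilde t; \tilde y, \tilde z) = \delta(\tilde t, \tilde y - \tilde z). \tag{39}$$

As usual, we transform into Fourier space:

$$i\omega \tilde G_1 + k^2 \tilde G_1 - \partial_k(k \tilde G_1) = e^{-ik\tilde z}. \tag{40}$$

If we use the integrating factor

$$\mu = \exp\left(-(i\omega \ln k + k^2/2)\right), \tag{41}$$

we obtain the following simplified form of Eq. (40)

$$-\partial_k(k\mu \tilde G_1) = \mu\, e^{-ik\tilde z}. \tag{42}$$

Since we are looking for a smooth solution, we expect $\tilde G_1 = 0$ as
$k \to \infty$. Therefore, the correct solution of the above equation is in the form

$$\tilde G_1(\omega, k, \tilde z) = k^{-1}\mu^{-1} \int_k^{\infty} e^{-ik'\tilde z}\, e^{-(i\omega \ln k' + k'^2/2)}\, dk'. \tag{43}$$

Inverting back to the time coordinate, we obtain

$$\tilde G_1(\tilde t, k, \tilde z) = e^{k^2/2} k^{-1} \int_k^{\infty} e^{-ik'\tilde z}\, e^{-k'^2/2}\, \delta(\tilde t - \ln k' + \ln k)\, dk'. \tag{44}$$

Now performing the delta function integration, we are left with

$$\tilde G_1(\tilde t, k, \tilde z) = e^{k^2/2}\, e^{\tilde t}\, e^{-ik\tilde z e^{\tilde t} - k^2 e^{2\tilde t}/2}. \tag{45}$$

This is simply a Gaussian function, and the transformation back to the $\tilde y$
coordinate leaves us with

$$G_1(\tilde t, \tilde y, \tilde z) = e^{-\tilde t}\, \tilde G_1(\tilde t, \tilde y, \tilde z) = \left[2\pi(e^{2\tilde t} - 1)\right]^{-1/2} \exp\left(-\frac{(\tilde y - \tilde z e^{\tilde t})^2}{2(e^{2\tilde t} - 1)}\right). \tag{46}$$

We would like to extract the full dependence of the above solution on $\tilde r$.
For normalization purposes, it is also convenient to multiply by $\tilde r^{1/2}$.
Thus rescaling back to the $t$ and $y$ coordinates results in

$$G_1(t, y, z) = \left[\frac{\tilde r}{2\pi(e^{2\tilde r t} - 1)}\right]^{1/2} \exp\left(-\frac{\tilde r\,(y - z e^{\tilde r t})^2}{2(e^{2\tilde r t} - 1)}\right). \tag{47}$$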
This finally results in an expression for the Green's function of Eq. (38), which
in turn gives the Green's function of Eq. (23) in Cartesian coordinates. Now, to
obtain the solution of Eq. (22), we need to revert back to spherical coordinates.
The resulting expression when $n$ is integer suggests that we look for $G(t, y, z)$
in the following form

$$\left[\frac{\tilde r}{2\pi(e^{2\tilde r t} - 1)}\right]^{n/2} \times \int_{-1}^{1} d\lambda\, \exp\left(-\frac{\tilde r}{2(e^{2\tilde r t} - 1)}\left(y^2 - 2yze^{\tilde r t}\lambda + z^2e^{2\tilde r t}\right)\right) K(\lambda). \tag{48}$$

Here $K(\lambda)$ is a kernel, which, for integer $n$, is the Jacobian of the
$n$-dimensional change of variables from Cartesian to spherical coordinates. It is
still undetermined for non-integer dimensions.



B Identification of terms dominating convergence 

In the main text, we argued that it is justifiable to drop the $y^{4-n}$ term in
Eq. (21), or equivalently, the $y^{5-2n}$ term in Eq. (30). In essence, the bulk of
the solution is supported away from $y = 0$, while this term is quickly suppressed
for $n > 3$. Without this (generally) non-integer power, we were able to calculate
exactly predictive information for our model. Whatever contributions the full
solution might add, they are of lower order than the leading term in Eq. (35).
Nonetheless, this term is crucial since it keeps the full solution physical by
guaranteeing its convergence faster than any power as $y \to 0$ ($x \to \infty$ in
the $x$ space). In this appendix, we will make the arguments a bit more precise.

Our approach is of the maximum principle type, which is employed abundantly
in the theory of partial differential equations. We present the arguments in a
general setting, not limited to the confines of our model. Our focus is on
equations of the type

$$\partial_t F(t,y) = -g(y)\,\partial_y F(t,y) + \partial_y^2 F(t,y), \quad y > 0. \tag{49}$$

$F$ is the cumulative probability $\int_0^y f(t, y')\, dy'$ of a distribution $f$
satisfying a Fokker-Planck equation with constant noise and a force $g(y)$. We
will assume that around $y \approx 0$, $g$ is positive and behaves as $1/y^{a}$ with
$a > 1$. We start by providing a sort of a zero value "eigenvector", i.e., a
solution of the equation

$$0 = -g(y)\,\partial_y F_0(y) + \partial_y^2 F_0(y). \tag{50}$$

It is straightforward to see that Eq. (50) is solved by

$$F_0(y) = \int_0^y dy' \exp\left(\int_{y_0}^{y'} dy''\, g(y'')\right), \tag{51}$$

where $y_0$ is any positive value. It follows that $F_0(y) \sim \exp(-1/y^{a-1})$,
thus it converges to zero as $y \to 0$, together with all of its derivatives.

The solution, Eq. (50), is non-normalizable, and it is, therefore, not a true
eigenvector. However, we can use it to bound normalizable solutions of Eq. (49).
That is, we will show that if initial conditions are bounded everywhere by a
multiple of $F_0$ (e.g., if their support does not include $0$), then the solution
$F(t,y)$ remains bounded for all times, and it will, therefore, have all
derivatives zero at $y = 0$. This implies that the exact solution of Eq. (14)
indeed has a finite tail, and this is all due to the third term in Eq. (30). By
imposing the requirement that this term diverges faster than $1/y$, we obtain
$n > 3$, or equivalently $\alpha < 4$.

In order to demonstrate that $F(t,y) \le F_0(y)$ if $F(0,y) \le F_0(y)$, we will
first show the following.
If $\tilde F(t,y)$ satisfies the boundary conditions $\tilde F(t,0) = 0$ and
$\tilde F(t,L) > 1/2$, for some $L > 0$, together with the initial condition
$\tilde F(0, y) \ge 0$, then $\tilde F$ remains non-negative for all times if it
also satisfies the following equation:

$$\partial_t \tilde F(t,y) = -\gamma \tilde F(t,y) - g(y)\,\partial_y \tilde F(t,y) + \partial_y^2 \tilde F(t,y), \quad \gamma > 0. \tag{52}$$

Proof. Assume a negative minimum of $\tilde F(t_0,y_0) < -\epsilon$ at some time
$t_0$ and point $y_0$. Clearly, $0 < y_0 < L$. Then, at $y_0$:

$$\partial_t \tilde F(t_0,y_0) = \partial_y^2 \tilde F(t_0,y_0) - g(y_0)\,\partial_y \tilde F(t_0,y_0) - \gamma \tilde F(t_0,y_0) > \partial_y^2 \tilde F(t_0,y_0) + \gamma\epsilon > 0. \tag{53}$$

This implies that there is a $\delta > 0$ such that $\tilde F < -\epsilon$ at some
points $y$, for all $t_0 - \delta < t < t_0$. Let $\bar t$ be the infimum of the set
of all times for which $\tilde F < -\epsilon$ at some point. Take a sequence
$\{t_n\}$ which converges to $\bar t$ and a sequence $\{y_n\}$ such that
$\tilde F(t_n, y_n) < -\epsilon$. Since $0 < \{y_n\} < L$, we can assume that it
converges to some $\bar y \ne 0$. Thus, $\tilde F(\bar t, \bar y) \le -\epsilon$. By
applying Eq. (53) again, we obtain that this is possible only if $\bar t = 0$,
which, in turn, is impossible because of the initial conditions.

Notice that the positivity of $\tilde F$ immediately implies the positivity of $F$
since there is a one-to-one mapping between the solutions of Eqs. (49) and (52)
given by $F\exp(-\gamma t) = \tilde F$. If we apply this to
$\Delta F(t,y) = F_0(y) - F(t,y)$, then $F_0(y) \ge F(t,y)$ for all times $t$, as
long as this is true for $t = 0$, just as we claimed earlier. We end with a comment
regarding the boundary condition requirement at $y = L$. If $F_0$ is
non-normalizable, then this condition is trivially satisfied. Otherwise, this
condition is a byproduct of the uniqueness requirements of the solution.
Therefore, the approximate solution, Eq. (24), is an upper bound on the exact
solution of Eq. (14).



C Bounding subleading terms in predictive information 

While we have not been able to obtain a closed form expression for all terms in 
Eq. (34), we can nonetheless provide asymptotically finite bounds on them. We will 
rely on the basic structure of the solution, Eq. (33), and repeated applications of 
Jensen's inequality. 
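
The two forms of Jensen's inequality relied on in this appendix, one for the convex function x log x and one for the concave function log x, can be checked numerically on a small discrete distribution. The sketch below is only an illustration: the weights `p`, the values `x`, and the helper name `jensen_gaps` are hypothetical and unrelated to the kernel or the densities of the paper.

```python
import numpy as np

def jensen_gaps(p, x):
    """Return (convex_gap, concave_gap) for weights p and positive values x.

    convex_gap  = E[x log2 x] - E[x] log2 E[x]   (>= 0 since x log x is convex)
    concave_gap = log2 E[x]   - E[log2 x]        (>= 0 since log x is concave)
    """
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=float)
    mean_x = np.dot(p, x)
    convex_gap = np.dot(p, x * np.log2(x)) - mean_x * np.log2(mean_x)
    concave_gap = np.log2(mean_x) - np.dot(p, np.log2(x))
    return convex_gap, concave_gap

# Hypothetical example: random normalized weights and positive values.
rng = np.random.default_rng(0)
p = rng.random(8)
p /= p.sum()
x = rng.random(8) + 0.1
convex_gap, concave_gap = jensen_gaps(p, x)
print(convex_gap >= 0.0, concave_gap >= 0.0)
```

Both gaps vanish only when all values of x with non-zero weight coincide. The convex form bounds the average of x log x from below by the same function of the average, and the concave form bounds the average of log x from above.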

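Starting with the full expression in Eq. (34), we first provide the following 
bounds for z > 0 and θ > 1, ϑ > 0: 

$$A(\theta,\vartheta) + B(\theta,\vartheta)\,z^{\theta} > \int_0^{\infty} dy\, y^{\theta}\,(y-z)^{\vartheta}\, e^{-(y-z)^2/2} > a(\theta,\vartheta)\,z^{\theta-1} + b(\theta,\vartheta)\,z^{\theta}. \tag{54}$$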

Here A, B, a, b are positive functions of θ and ϑ only. It is useful to normalize the 
kernel K(λ). Thus we define 

K = J K{\)d\ = 2 n -* r \ n 2 _[ y (55) 

where the last equality contains the usual Gamma function. We now can provide 
an upper bound on the integral terms in Eq. (34). By using the fact that x log(x) is 
a convex function, we obtain 



log 2 K 



C~ 1 (2^)" /2 ^log 2 | 1 i d\K(X)E(T;X,y,w)^-C- 1 (2nr / 



A n (T) J_ 

poo „ n — 1 

< K / 

J a 



d y-X^ d\^~(T; A, y, w) log 2 S(T; X, y, w) < 



A n (T) k 

< -(1/2) log 2 (e) £ dA-K"(A) a(n - 1, 2)X"~ 2 + 



(56) 



b(n -1,2)X" 



\A(T)J 



+ a(n - 1,0)(1- A 2 )A n ~ 2 



{a(T)J + 



fe(n-l,0)(l-A 2 )A"- 1 



; ?r w e 

g 



Similarly, utilizing the concavity of log(x), we can write a lower bound on the 
expectation value 



C 



- 1 {2^ n/2 (\og 2 j 1 ^ dXK(X)S(T-X,y, W )^-C- 1 (27vy 
-(l/2)log 2 (e) J dXK(X) 



log 2 



A{n- 1,0) 



A(T) 



A(n + 1,0) + B{n+1, 0)X 

tT \ n + 1 



A(T) 



(57) 



+ B(n- 1,0)A™~ 



i / we 



A(T) 



1 — A 2 2tT 

n~. we 

g 2A 2 (T) 



Therefore, we have obtained bounds on the third term in Eq. (34) that are polynomial 
in e^{τT}/Δ(T). The latter is, in turn, a bounded function of T = t + t. Indeed, it 
is straightforward to show that e TT /A(T) < ypT+ y/T/T. Therefore, these bounds 
are asymptotically constant (as T → ∞) and either O(1) or O(√|τ|). We can use 
these bounds on the second term of Eq. (34) by simply replacing w by z and T by t 
in Eqs. (56) and (57). The resulting expressions need to be averaged over z, which 
requires estimating quantities of the form 

L < dz 



F 

Jo 



A n (t) \A{i) 
x J dXK(X) exp 



g 2/i«(t) 
1 

2Z\(^ 



(z 2 - 2zwXe Tt + wV Tt ) < U, (58) 



where m is a positive number. By using Eq. (54) again, we can obtain an upper 
and a lower bound on this expression. It is convenient to introduce rf = (1 — 

X 2 )e 2rt A 2 {t) / A 2 it). Then, after some algebra, we obtain the following two bounds: 
an upper bound 



U 



dXK(X) 



T 



2 l m/2 



1-A 2 
+B(n + m- 1,0) 



(1 + 7? 2 )-("+ m )/ 2 
Awe ft 



^(^(l + r? 2 )^ 



A(n + m- 1,0) + 

1 / A 2 \ » 2 e 2ft ' 

"2 l^TT^j^y 



(59) 
and a lower bound 



d\K(X) 



-(n+m)/2 



1 — A 2 

a(n + m - 1, 0) + b(n + m - 1, 0) 



Xwe T 



A(t)(l +7? 2 ) 1 /2 



n+m— 2 



Awe T 



A(t)(l + rj 2 ) 1 / 2 



(60) 



x exp 



A 



I + 77 2 / A 2 (t) 



Notice that, for τ > 0, η → ∞ as t → ∞, while both bounds in Eqs. (59) and 
(60) are of order O(η^{-m/2}); therefore they are asymptotically constant. For τ < 0, 
Eqs. (59) and (60) are controlled by O(|τ|^{m/2}). This implies that the second term 
in Eq. (34) is also bounded around the critical point, independently of τ. This 
completes the proof that the terms we dropped in Eq. (34) do not contribute to 
the leading order of predictive information. 